<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>The AnnotationSketch module</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
<div id="menu">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="https://github.com/genometools/genometools/releases">Releases</a></li>
<li><a href="pub/">Archive</a></li>
<li><a href="https://github.com/genometools/genometools">Browse source</a></li>
<li><a href="http://github.com/genometools/genometools/issues/">Issue tracker</a></li>
<li><a href="documentation.html">Documentation</a></li>
<li><a id="current" href="annotationsketch.html"><tt>AnnotationSketch</tt></a></li>
  <ul class="submenu">
    <li><a href="annotationsketch.html#collapsing">Collapsing</a></li>
    <li><a href="annotationsketch.html#styles">Styles</a></li>
    <li><a href="trackselectors.html">Track assignment</a></li>
    <li><a href="customtracks.html">Custom tracks</a></li>
    <li><a href="annotationsketch.html#gtsketch">The <tt>gt sketch</tt> tool</a></li>
    <li><a href="examples.html">Code examples</a></li>
    <li><a href="libgenometools.html">API reference</a></li>
  </ul>
<li><a href="/cgi-bin/gff3validator.cgi">GFF3 validator</a></li>
<li><a href="license.html">License</a></li>
</ul>
</div>


<div id="main">
<h1>The <tt>AnnotationSketch</tt> annotation drawing library</h1>

<p>
The <em>AnnotationSketch</em>
module is a versatile and efficient C-based drawing library for GFF3-compatible genomic annotations. It is included in the <em>GenomeTools</em> <a href="http://genometools.org/pub">distribution</a>. Additionally, bindings to the <a href="http://www.lua.org">Lua</a>, <a href="http://www.python.org">Python</a> and <a href="http://www.ruby-lang.org/en">Ruby</a> programming languages are provided.
</p>
<h2>Contents</h2>
<ul>
 <li><a href="#overview">Overview</a></li>
 <li><a href="#collapsing">Collapsing</a></li>
 <li><a href="#styles">Styles</a></li>
 <li><a href="#trackselectors">Track assignment</a></li>
 <li><a href="#customtracks">Custom tracks</a></li>
 <li><a href="#gtsketch">The <tt>gt sketch</tt> tool</a></li>
 <li><a href="#examples">Code examples</a></li>
 <li><a href="#api">API Reference</a></li>
</ul>
<h2>Overview</h2>
<a name="overview"></a>
<p>
<em>AnnotationSketch</em> consists of several classes (see Fig. <a href="#fig1">1</a>), which take part in three visualization <em>phases</em>.
</p>
<div class="figure">
  <p><a name="fig1"></a><img src="images/dataflow.png" alt="[Dataflow]"></p>
  <p><b>Figure 1: </b>Schematic of the <em>AnnotationSketch</em> classes involved in image creation. Interfaces are printed in italics. </p>
</div>


<h3>Phase 1: Feature selection</h3>
<p>
The GFF3 input data are parsed into a directed acyclic graph (<em>annotation graph</em>, see Fig. <a href="#fig2">2</a> for an example) whose nodes correspond to single features (i.e. lines from the GFF3 file). Consequently, edges in the graph represent the <em>part-of</em> relationships between groups of genomic features according to the <a href="http://www.sequenceontology.org/">Sequence Ontology</a> hierarchy. Note that GFF3 input files <em>must</em> be valid according to the <a href="http://www.sequenceontology.org/gff3.shtml">GFF3 specification</a> to ensure that they can be read for <em>AnnotationSketch</em> drawing or any other kind of manipulation using <em>GenomeTools</em>. A validating GFF3 parser is available in <em>GenomeTools</em> (and can be run using <tt>gt gff3validator</tt>).
</p>
<div class="figure">
  <p><a name="fig2"></a><img src="images/gfftree.png" alt="[GFF3 tree]"></p>
  <p><b>Figure 2: </b>Example sequence region containing two genes in an annotation graph depicting the <em>part-of</em> relationships between their components.</p>
</div>
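<p>
The following Python sketch illustrates the idea of an annotation graph built from <tt>ID</tt> and <tt>Parent</tt> attributes. It is not the <em>GenomeTools</em> parser: the function names and the sample data are hypothetical, validation is omitted, and every feature is assumed to carry an <tt>ID</tt> attribute (which GFF3 does not require).
</p>
<pre class="code">
from collections import defaultdict

# Hypothetical sample data: one gene with one mRNA and two exons.
GFF3_LINES = """\
chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1
chr1\t.\tmRNA\t1000\t9000\t.\t+\t.\tID=mRNA1;Parent=gene1
chr1\t.\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=mRNA1
chr1\t.\texon\t8000\t9000\t.\t+\t.\tID=exon2;Parent=mRNA1
"""


def build_annotation_graph(text):
    """Return (nodes, children, top_level) for tab-separated GFF3 lines."""
    nodes = {}                    # feature ID to parsed columns
    children = defaultdict(list)  # parent ID to child IDs (part-of edges)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        attributes = dict(pair.split("=", 1)
                          for pair in cols[8].split(";") if pair)
        fid = attributes["ID"]    # assumption: every feature has an ID
        nodes[fid] = {"seqid": cols[0], "type": cols[2],
                      "start": int(cols[3]), "end": int(cols[4])}
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                children[parent].append(fid)
    has_parent = {c for kids in children.values() for c in kids}
    top_level = [fid for fid in nodes if fid not in has_parent]
    return nodes, children, top_level


def descendants(fid, children):
    """Depth-first traversal of all features below the given feature."""
    for child in children.get(fid, []):
        yield child
        yield from descendants(child, children)


nodes, children, top_level = build_annotation_graph(GFF3_LINES)
print(top_level)
print(list(descendants("gene1", children)))
</pre>
<p>
Only <tt>gene1</tt> has no parent, so it is the single top-level node; the mRNA and both exons are reached by traversing its children.
</p>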
<p>
Each top-level node (i.e. a node without a parent) is then registered in a persistent <em>FeatureIndex</em> object. The <em>FeatureIndex</em> stores the top-level nodes of all features in each sequence region in an interval tree data structure, which can be queried efficiently for features in a genomic region of interest. All child nodes of a top-level node are then available via traversal functions.
</p>
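<p>
As a rough illustration of the kind of region query a feature index answers, consider the Python sketch below. The class and method names are hypothetical, and a plain list with a linear scan is used instead of an interval tree for brevity, so it shows the query semantics only and not the efficiency of the real data structure.
</p>
<pre class="code">
from collections import defaultdict


class SimpleFeatureIndex:
    """Toy stand-in: top-level features stored per sequence region."""

    def __init__(self):
        self._by_seqid = defaultdict(list)

    def add(self, seqid, start, end, name):
        self._by_seqid[seqid].append((start, end, name))

    def features_in(self, seqid, qstart, qend):
        """Names of features overlapping [qstart, qend] (inclusive)."""
        return [name
                for start, end, name in self._by_seqid.get(seqid, [])
                if start <= qend and qstart <= end]


# Hypothetical features on a hypothetical sequence region.
index = SimpleFeatureIndex()
index.add("chr1", 100, 500, "geneA")
index.add("chr1", 400, 900, "geneB")
index.add("chr1", 2000, 2500, "geneC")

print(index.features_in("chr1", 450, 1000))
</pre>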
<p>
Alternatively, users can build annotation graphs themselves by creating each node explicitly and then connecting the nodes so that the graph structure reflects their relationships (see the <a href="examples.html">example section</a> for example code).
</p>
<h3>Phase 2: Layout</h3>
<p>
The next step consists of processing the features (given via a <em>FeatureIndex</em> or a simple array of top-level nodes) into a structural <em>Diagram</em> object, which represents a single view of the annotations of a genomic region. First, semantic units are formed from the annotation subgraphs. This is done by building <em>blocks</em> out of connected features by grouping and overlaying them according to several user-defined collapsing options (see <a href="#collapsing">Collapsing</a>). By default, a separate <em>track</em> is then created for each Sequence Ontology feature type. Alternatively, if more granularity in track assignment is desired, <a href="trackselectors.html"><em>track selector functions</em></a> can be used to create tracks and assign blocks to them based on arbitrary feature characteristics. Such a function simply produces a unique identifier string for the track a block belongs to. The <em>Diagram</em> object can also hold one or more <a href="customtracks.html"><em>custom tracks</em></a>, which allow users to develop their own graphical representations as plugins.

The <em>Diagram</em> is then prepared for image output by calculating a compact <em>Layout</em> in which the <em>Block</em> objects in a track are distributed into <em>Line</em> objects, each containing non-overlapping blocks (see Fig. <a href="#fig3">3</a>). The overall layout calculated this way tries to keep lines as compact as possible, minimising the amount of vertical space used. How new <em>Lines</em> are created depends on the chosen implementation of the <em>LineBreaker</em> interface. By default, a <em>Block</em> is pushed into a new <em>Line</em> when either the <em>Block</em> or its caption overlaps with another one.
</p>

<div class="figure">
  <p><a name="fig3"></a><img src="images/diagram.png" alt="[Diagram]"></p>
  <p><b>Figure 3: </b>The components of the <em>Layout</em> class reflect sections of the resulting image.</p>
</div>
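<p>
The distribution of blocks into lines can be pictured with the following Python sketch. It is a generic greedy "first line that fits" strategy under simplifying assumptions (captions are ignored and blocks are plain coordinate tuples); it is not claimed to be the algorithm of any particular <em>LineBreaker</em> implementation, and the function name is hypothetical.
</p>
<pre class="code">
def distribute_into_lines(blocks):
    """Place each block in the first line in which it does not overlap.

    blocks: iterable of (start, end, name), coordinates inclusive.
    Returns a list of lines, each a list of non-overlapping blocks.
    """
    lines = []
    for block in sorted(blocks):
        start = block[0]
        for line in lines:
            # Blocks arrive sorted by start, so within a line only the
            # most recently added block can still overlap the new one.
            if line[-1][1] < start:
                line.append(block)
                break
        else:
            lines.append([block])  # no line has room: open a new one
    return lines


# Hypothetical blocks of a single track.
BLOCKS = [(100, 500, "a"), (400, 900, "b"),
          (600, 800, "c"), (1000, 1200, "d")]
lines = distribute_into_lines(BLOCKS)
for number, line in enumerate(lines, 1):
    print(number, [name for _, _, name in line])
</pre>
<p>
Here block <tt>b</tt> overlaps <tt>a</tt> and is therefore moved to a second line, while <tt>c</tt> and <tt>d</tt> still fit into the first one.
</p>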
<h3>Phase 3: Rendering</h3>
<p>
In the final phase, the <em>Layout</em> object is used as a blueprint to create an
image of a given type and size, considering user-defined options.

The rendering process is invoked by calling the <em>sketch()</em> method of a <em>Layout</em> object. All rendering logic is implemented in classes implementing the <em>Canvas</em> interface, whose methods are called during traversal of the <em>Layout</em> members. The <em>Canvas</em> interface encapsulates the state of a drawing and is independent of the chosen rendering back-end. Back-end-specific subclasses of <em>Canvas</em> are, in turn, closely tied to a specific implementation of the <em>Graphics</em> interface, which provides methods to draw a number of primitives to a drawing surface abstraction. The <em>Graphics</em> implementation wraps the respective low-level graphics engine, so that the engine can easily be extended or replaced.<br>
 Currently, there is a <em>Graphics</em> implementation for the <a href="http://cairographics.org">Cairo</a> 2D graphics library (<em>GraphicsCairo</em>) and two <em>Canvas</em> subclasses providing access to the image file formats supported by Cairo (<em>CanvasCairoFile</em>) and to arbitrary Cairo contexts (<em>CanvasCairoContext</em>, which directly accesses a <a href="http://www.cairographics.org/manual/cairo-context.html"><tt>cairo_t</tt></a>). The latter class can be used, for example, to directly draw <em>AnnotationSketch</em> output in any graphical environment which is supported by Cairo (see the <a href="http://www.cairographics.org/manual/cairo-surfaces.html">list of supported surface types</a>).
</p>
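<p>
The separation between layout traversal and rendering can be sketched in Python as follows. All names here (<tt>TextCanvas</tt>, the <tt>visit_*</tt> methods and the nested-list layout) are hypothetical and chosen for illustration only; the sketch shows the general pattern of a traversal calling back into an exchangeable canvas, not the actual <em>Canvas</em> or <em>Graphics</em> API.
</p>
<pre class="code">
class TextCanvas:
    """Hypothetical canvas that renders by recording text lines."""

    def __init__(self):
        self.output = []

    def visit_track(self, title):
        self.output.append("track " + title)

    def visit_line(self, number):
        self.output.append("  line %d" % number)

    def visit_block(self, name, start, end):
        self.output.append("    block %s %d-%d" % (name, start, end))


def sketch(layout, canvas):
    """Traverse the layout; the canvas decides how things are drawn."""
    for title, lines in layout:
        canvas.visit_track(title)
        for number, line in enumerate(lines, 1):
            canvas.visit_line(number)
            for start, end, name in line:
                canvas.visit_block(name, start, end)


# Hypothetical layout: one track holding two lines of blocks.
LAYOUT = [("mRNA", [[(100, 500, "a"), (600, 800, "c")],
                    [(400, 900, "b")]])]
canvas = TextCanvas()
sketch(LAYOUT, canvas)
print("\n".join(canvas.output))
</pre>
<p>
Replacing <tt>TextCanvas</tt> with another class offering the same methods would change the output format without touching the traversal code.
</p>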

<a name="collapsing"></a>
<h3>Collapsing</h3>
<p>
By default, <em>Lines</em> are grouped by the Sequence Ontology type associated with the top-level elements of their <em>Blocks</em>, resulting in one track per type.
To obtain a more compact output, tracks for parent types in the feature graph can be set up to also contain the features of their child types. Features of a collapsed type are then drawn on top of their parent features (e.g. all <em>exon</em> and <em>intron</em> features are placed into their parent <em>mRNA</em> or <em>gene</em> track). This process is called <em>collapsing</em>. Collapsing can be enabled by setting the <tt>collapse_to_parent</tt> option for the respective child type to <tt>true</tt>. For example, the following options:
</p>
<pre class="code">
style = {
  exon = {
    ...,
    collapse_to_parent = true,
    ...,
  },
  intron = {
    ...,
    collapse_to_parent = true,
    ...,
  },
  CDS = {
    ...,
    collapse_to_parent = true,
    ...,
  },
}
</pre>
<p>would lead to all features of the <em>exon</em>, <em>intron</em> and <em>CDS</em> types collapsing into the <em>mRNA</em> track (see Fig. <a href="#fig4">4</a> and <a href="#fig5">5</a>).</p>
<div class="figure">
<div class="subfigure">
<p><b>Figure 4:</b> Schematic of the relationships between the <em>gene</em>, <em>mRNA</em>, <em>exon</em>, <em>intron</em> and <em>CDS</em> types and the colors of their representations in a diagram. The arrows illustrate how the relationships influence the collapsing process if collapsing is enabled for the <em>exon</em>, <em>intron</em> and <em>CDS</em> types. In this example, they will be drawn on top of their parent <em>mRNA</em> features.</p>
</div>
<div class="subfigure">
<p><a name="fig4"></a><img src="images/collapse_types.png" alt="[collapsing]"></p>
</div>
<div style="clear:both;"></div>
</div>
<div class="figure">
<p><a name="fig5"></a><a style="background-color: white; border: none;" href="images/cnn_large.png"><img src="images/collapsing.png" alt="[collapsed/uncollapsed views]"></a></p>
<p><b>Figure 5: </b>(click to enlarge) Example image of the <em>cnn</em> and <em>cbs</em> genes from <em>Drosophila melanogaster</em> (<a href="http://www.ensembl.org/Drosophila_melanogaster/contigview?region=2R&amp;vc_start=9326816&amp;vc_end=9341000">Ensembl release 51, positions 9326816&ndash;9341000 on chromosome arm 2R</a>) as drawn by <em>AnnotationSketch</em>. At the bottom, the calculated GC content of the respective sequence is drawn via a custom track attached to the diagram. (a) shows a collapsed view in which all <em>exon</em>, <em>intron</em> and <em>CDS</em> types are collapsed into their parent type's track. In contrast, (b) shows the <em>cbs</em> gene with all collapsing options set to <tt>false</tt>, resulting in each type being drawn in its own track.</p>
</div>
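<p>
The effect of <tt>collapse_to_parent</tt> on track assignment can be mimicked with a few lines of Python. This is a simplified sketch: it works on a hypothetical type-level parent table rather than on the actual feature graph, and <tt>track_for</tt> is an illustrative name, not a library function.
</p>
<pre class="code">
def track_for(feature_type, parent_of, style):
    """Follow part-of edges upward while the type is set to collapse."""
    while (style.get(feature_type, {}).get("collapse_to_parent")
           and feature_type in parent_of):
        feature_type = parent_of[feature_type]
    return feature_type


# Hypothetical type relationships, as in Fig. 4.
PARENT_OF = {"mRNA": "gene", "exon": "mRNA",
             "intron": "mRNA", "CDS": "mRNA"}

# Mirrors the style options shown above.
STYLE = {
    "exon": {"collapse_to_parent": True},
    "intron": {"collapse_to_parent": True},
    "CDS": {"collapse_to_parent": True},
}

for ftype in ("gene", "mRNA", "exon", "intron", "CDS"):
    print(ftype, "is drawn in track", track_for(ftype, PARENT_OF, STYLE))
</pre>
<p>
With these settings, <em>exon</em>, <em>intron</em> and <em>CDS</em> end up in the <em>mRNA</em> track. If collapsing were additionally enabled for <em>mRNA</em>, all of them would end up in the <em>gene</em> track.
</p>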

<a name="styles"></a>
<h3>Styles</h3>
<p>
The <a href="http://www.lua.org/">Lua</a> scripting language is used to provide
user-defined settings. Settings can be imported from a script that is executed
when loaded, thus eliminating the need for another parser. The Lua configuration
data are made accessible to C via the <em>Style</em> class. Configurable options
include assignment of display styles to each feature type, spacer and margin
sizes, and collapsing parameters.
</p>
<p>Instead of giving direct values, Lua callback functions can be used in some options to generate feature-dependent configuration settings at run-time. During layout and/or rendering, the <a href="http://genometools.org/docs.html#GenomeNode"><em>GenomeNode</em></a> object for the feature to be rendered is passed to the callback function, which can inspect it and return a value of the type expected by the option.</p>
<p>For example, setting the following options in the style file (or via the Lua bindings):</p>
<pre class="code">
style = {
  ...,
  mRNA = {
    block_caption      = function(gn)
                           local rng = gn:get_range()
                           return string.format("%s/%s (%dbp, %d exons)",
                                 gn:get_attribute("Parent"),
                                 gn:get_attribute("ID"),
                                 rng:get_end() - rng:get_start() + 1,
                                 #(gn:get_exons()))
                         end,
    ...
  },

  exon = {
    -- Color definitions
    fill               = function(gn)
                           -- use the score as opacity; fully transparent if unset
                           local aval = 0.0
                           if gn:get_score() then
                             aval = gn:get_score()*1.0
                           end
                           return {red=1.0, green=0.0, blue=0.0, alpha=aval}
                         end,
    ...
  },
  ...
}
</pre>
<p>will result in a changed rendering (see Fig. <a href="#fig_callbacks">6</a>). The <tt>block_caption</tt> function overrides the default block naming scheme, allowing a custom caption to be set for each block depending on feature properties. Color definitions such as the <tt>fill</tt> setting for a feature's fill color can also be computed individually using callbacks. In this case, the alpha (opacity) value of the fill color is taken from the <em>exon</em> feature's score value (e.g. given in a <a href="annotationsketch/callback_examples_with_score.gff3">GFF file</a>).</p>
<div class="figure">
  <p><a name="fig_callbacks"></a><img src="images/callbacks.png" alt="[Example rendering with callback functions]"></p>
  <p><b>Figure 6: </b>Example rendering of a <a href="annotationsketch/callback_examples_with_score.gff3">GFF file</a> using callback functions to enable custom block captions and score-dependent shading of exon features.</p>
</div>
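<p>
The difference between direct values and callbacks can be summarised by the following Python sketch. It only models the general mechanism: features are represented as plain dictionaries, and <tt>resolve_option</tt> and <tt>exon_fill</tt> are hypothetical names that do not belong to the <em>Style</em> class or its bindings.
</p>
<pre class="code">
def resolve_option(value, node):
    """Return a direct value as is, or call a callback with the node."""
    return value(node) if callable(value) else value


def exon_fill(node):
    """Callback: red fill whose opacity is given by the feature score."""
    score = node.get("score")
    alpha = float(score) if score is not None else 0.0
    return {"red": 1.0, "green": 0.0, "blue": 0.0, "alpha": alpha}


# Hypothetical style table mixing a callback and a direct value.
STYLE = {
    "exon": {"fill": exon_fill},
    "intron": {"fill": {"red": 1.0, "green": 1.0,
                        "blue": 1.0, "alpha": 1.0}},
}

scored_exon = {"type": "exon", "score": 0.5}
unscored_exon = {"type": "exon"}
print(resolve_option(STYLE["exon"]["fill"], scored_exon))
print(resolve_option(STYLE["exon"]["fill"], unscored_exon))
print(resolve_option(STYLE["intron"]["fill"], {"type": "intron"}))
</pre>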
<p>An overview of the keywords used in a style file is <a href="style_options.html">available</a>.</p>

<a name="trackselectors"></a>
<h2>Track assignment</h2>
Tracks are normally created for each annotation source and/or Sequence Ontology type encountered in the annotation graph. More control is possible using <em>track selector functions</em> (<a href="trackselectors.html">read more</a>).

<a name="customtracks"></a>
<h2>Custom tracks</h2>
Even more customisability is possible using <em>custom tracks</em>. Custom tracks are classes which contain custom drawing functionality, making it easy to add user-defined graphical content to any <em>AnnotationSketch</em> drawing (<a href="customtracks.html">read more</a>).

<a name="gtsketch"></a>
<h2>The <tt>gt sketch</tt> tool</h2>

The <em>GenomeTools</em> <tt>gt</tt> executable provides the <tt>sketch</tt> tool, which uses the <em>AnnotationSketch</em> library to create a drawing in PNG, PDF, PostScript or SVG format from GFF3 annotations. The annotations can be given by supplying one or more file names as command line arguments:
<pre>
$ gt sketch output.png annotation.gff3
$
</pre>
or by receiving GFF3 data via the standard input, here prepared by the <tt>gt gff3</tt> tool:
<pre>
$ gt gff3 -sort -addintrons annotation.gff3 | gt sketch output.png
$
</pre>
The region to create a diagram for can be specified in detail by using the <tt>-seqid</tt>, <tt>-start</tt> and <tt>-end</tt> parameters. For example, if the <em>D. melanogaster</em> gene annotation is given in a file named <tt>dmel_annotation.gff3</tt>, use
<pre>
$ gt sketch -format pdf -seqid 2R -start 9326816 -end 9332879 output.pdf dmel_annotation.gff3
$
</pre>
to plot the gene from the <a href="http://flybase.org/cgi-bin/gbrowse/dmel/"><em>FlyBase</em> default view</a> in PDF format.
The <tt>-force</tt> switch forces overwriting of an already existing output file. The <tt>-pipe</tt> option additionally passes the GFF3 input through the sketch tool to the standard output, allowing intermediate visualisation of results in a longer pipeline of connected GFF3 tools. More command line options are available; their documentation can be viewed using the <tt>-help</tt> option.

If an input file cannot be plotted due to parsing errors, the strict GFF3 validator tool included in <em>GenomeTools</em> can be used to check whether the input file is in valid GFF3 format. Simply run a command like the following:
<pre>
$ gt gff3validator input_file.gff3
input is valid GFF3
$
</pre>
This validator also allows one to check the SO types occurring in a GFF3 file against a given OBO ontology file. This checking can be enabled by specifying the file as an argument to the <tt>-typecheck</tt> option.
<p>
If the PDF, SVG and/or PostScript output format options are not available in the <tt>gt</tt> binary, the most likely cause is that PDF, SVG and/or PostScript support is disabled in your local <em>Cairo</em> headers and thus also not available in your local <em>Cairo</em> library. This issue is not directly related to <em>AnnotationSketch</em> and can be resolved by recompiling the <em>Cairo</em> library with the proper backend support enabled.
</p>
<a name="examples"></a>
<h2>Code examples</h2>
Besides the native C programming interface, <em>AnnotationSketch</em> is usable from a variety of programming languages. Code examples for the individual languages can be found <a href="examples.html">here</a>.

<a name="api"></a>
<h2>API Reference</h2>
A function reference for the <em>AnnotationSketch</em> classes can be found in
the <em>GenomeTools</em> <a href="libgenometools.html">C API reference</a>.

<!--
<a name="users"></a>
<h2>Current users</h2>
TBD
-->

<div id="footer">
Copyright &copy; 2007-2023 The <i>GenomeTools</i> authors. Last update: 2011-02-11
</div>
</div>
</body>
</html>
